spec-047 §4.9: ARM64 post-Phase-4 micro perf capture (LAPTOP-4MEP83VI) #465
Merged
Indicative M1–M13 capture on the §4.9 baseline box, ARM64-native/Release/.NET 10.0.8, reps=5, iters matched to the 2026-05-25 baseline (M1–M8 @5000, M9 @2000, M10–M13 @1000). Adds raw JSONL, the aggregator-out tables, RESULTS.md (cross-baseline comparison), and analyze.py.

Allocation (deterministic, valid; Direct alloc matches baseline byte-for-byte):
- §15.6 "M1–M3 alloc ≤ Today": M2 −5%, M3 −6% PASS; M1 +20% FAIL.
- §11.6 byte gates: M3 PASS; M1 (3.2×) and M2 (2.4×) over target.
- vs baseline ReactorV2: most benches flat/better (M9 −41% standout); M1 +20% and M12 +17% are real, deterministic regressions to investigate. Confirms the KD-3 trigger (M1 over budget).

NOT a ratification sign-off: §15.5 isolation (AC/High-Perf/DRR/foreground) was not enforced, so the timing axis is environment-throttled (Direct ns +60–140% vs baseline) and must be disregarded; the §4.9 randomized/interleaved ordering + CPU-clock telemetry is not wired; and the macro suite (L1–L14) is unrunnable (its projects were deleted in Phase 4).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#465) Updates the spec body + both trackers to reflect the indicative LAPTOP-4MEP83VI capture now that the deterministic allocation axis is measured:
- §4.4/§11.6 byte gates MEASURED: M3 PASS; M1 (3.2×) + M2 (2.4×) FAIL.
- §15.6 "M1–M3 alloc ≤ Today": M2/M3 PASS, M1 +20.3% FAIL; M12 +17% regressed.
- KD-3 trigger CONFIRMED (M1 over budget): fold warranted + investigate the bucketing regression.
- Gate stays OPEN: timing axis throttled (no §15.5 isolation), macro suite unrunnable (projects deleted in Phase 4); needs an isolated re-capture + the M1/M12 alloc fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Captures the post-Phase-4 (V1-default) micro perf state of `PerfBench.ControlModel` (M1–M13) on `LAPTOP-4MEP83VI`, the exact ARM64 baseline box that spec-047 §4.9 was blocked on, and compares it against the original `2026-05-25-arm64` baseline already in the repo.

Adds under `docs/specs/047/phase4-results/LAPTOP-4MEP83VI/2026-05-29-arm64/`:
- raw JSONL (`perfbench-controlmodel-{m1-m8,m9,m10-m13}.jsonl`)
- `aggregator-out/` canonical tables (matches the baseline dir shape)
- `RESULTS.md`: the cross-baseline interpretation
- `analyze.py`: reproducible per-render comparison

Run params match the baseline exactly: ARM64-native / Release / .NET 10.0.8, reps=5, iters M1–M8 @5000 / M9 @2000 / M10–M13 @1000. 195 rows, 0 errors, 0 excluded by the §15.5 env-metadata gate.
Two interpretation notes
- `ReactorToday` ≡ `Reactor` now: §4.5 deleted the legacy dispatch switch, so the harness's "Today" variant runs the same V1 path. The real comparison is current `Reactor` vs the baseline's `ReactorV2`/`ReactorToday` columns (computed in `RESULTS.md`/`analyze.py`).
- Allocation is the valid axis (deterministic; `Direct` alloc matches baseline byte-for-byte). The ns axis is environment-throttled: `Direct` (zero Reactor code) is +60–140% slower than the baseline run, so cross-baseline timing is disregarded.

Allocation findings (the valid axis)
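The percentages quoted in this PR are relative changes against the baseline column, and `Direct` serves as the control that decides whether an axis is comparable at all. A minimal sketch of that arithmetic follows. It is not the shipped `analyze.py`: the function names, the 5% tolerance, and every number below are illustrative assumptions, not values from this capture.

```python
def pct_delta(current: float, baseline: float) -> float:
    """Relative change of `current` vs `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) * 100.0 / baseline


def axis_is_comparable(direct_current: float, direct_baseline: float,
                       tolerance_pct: float = 5.0) -> bool:
    """Treat the Direct variant as a control.

    Direct runs no Reactor code, so any drift between two captures on the
    same box is attributed to the environment. If the control drifts by
    more than `tolerance_pct` (an assumed threshold), the axis is not
    comparable across baselines.
    """
    return abs(pct_delta(direct_current, direct_baseline)) <= tolerance_pct


# Illustrative values only (hypothetical bytes and nanoseconds).
alloc_comparable = axis_is_comparable(1024, 1024)     # control identical
timing_comparable = axis_is_comparable(160.0, 100.0)  # control 60% slower
regression = pct_delta(1200, 1000)                    # 20.0

print(alloc_comparable, timing_comparable, regression)
```

With a control that matches byte-for-byte, the allocation deltas can be read directly. A control that drifts by tens of percent means the timing deltas say more about the machine state than about the code under test.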
⛔ Not a §4.9 ratification sign-off
StressPerf.ReactorV2, BlankReactorV2).

A real ratification needs an isolated stable-AC re-capture + the macro suite rebuilt against the single `Reactor` variant. This PR is the data + analysis, not the gate close.

🤖 Generated with Claude Code